A Multimodal Approach to Audiovisual Text-to-Speech Synthesis
نویسنده
چکیده
Oral speech has always been the most important means of communication between humans. When a message is conveyed using oral speech, it is encoded in two separate signals: an auditory speech signal and a visual speech signal. The auditory speech signal consists of a series of speech sounds that are produced by the human speech production system. In order to generate different sounds, the parameters of this speech production system are varied. Since some of the human articulators are visible to an observer (e.g., the lips, the teeth and the tongue), while uttering the speech sounds the variations of these visible articulators define the visual speech signal. It is well known that an optimal conveyance of the message is possible only when both the auditory and the visual speech signals are perceived by the receiver.
منابع مشابه
2D Audiovisual Text-to-Speech Synthesis for Human-Machine Interaction in Dutch
Speech has always been the most important means of communication between humans. Therefore, using speech in machine-human communication can help in increasing the naturalness of the communication between a computer system and a user. Systems that can make a machine pronounce any given input text are referred to as text-to-speech systems. To further enhance the communication, a talking head can ...
متن کاملMultimodal coherency issues in designing and optimizing audiovisual speech synthesis techniques
This paper proposes a 2D audiovisual text-to-speech synthesis system that constructs the output signal by selecting and concatenating multimodal segments containing natural combinations of audio and video. We describe the experiments that were conducted in order to assess the impact of this joint audio/video synthesis technique on the perceived quality of the synthetic speech. The experiments i...
متن کاملEvaluating a virtual speech cuer
This paper presents the virtual speech cuer built in the context of the ARTUS project aiming at watermarking hand and face gestures of a virtual animated agent in a broadcasted audiovisual sequence. For deaf televiewers that master cued speech, the animated agent can be then superimposed on demand and at the reception on the original broadcast as an alternative to subtitling. The paper presents...
متن کاملEvaluating a virtual
This paper presents the virtual speech cuer built in the context of the ARTUS project aiming at watermarking hand and face gestures of a virtual animated agent in a broadcasted audiovisual sequence. For deaf televiewers that master cued speech, the animated agent can be then superimposed on demand and at the reception on the original broadcast as an alternative to subtitling. The paper presents...
متن کاملStudy on Unit-Selection and Statistical Parametric Speech Synthesis Techniques
One of the interesting topics on multimedia domain is concerned with empowering computer in order to speech production. Speech synthesis is granting human abilities to the computer for speech production. Data-based approach and process-based approach are the two main approaches on speech synthesis. Each approach has its varied challenges. Unit-selection speech synthesis and statistical parametr...
متن کاملA multimedia platform for audio-visual speech processing
In the framework of the European ESPRIT Project MIAMI ("Multimodal Integration for Advanced Multimedia Interfaces"), a platform has been developed at the ICP to study the various combinations of audiovisual speech processing, including real-time lip motion analysis, real-time synthesis of models of the lips and of the face, audiovisual speech recognition of isolated words, and text-to-audio-vis...
متن کامل